[57] ABSTRACT

A method and implementing multiprocessor computer system 200 in which graphics applications 101 are executed in conjunction with a graphics interface 103 to graphics hardware 115. The methodology is also applicable to an implementing distributed network system. A master thread 105, or master node in a distributed network system, receives commands from a graphics application 101 and assembles 313 the commands into workgroups, each with an associated workgroup control block 315 and a synchronization tag 317. For each workgroup, the master thread flags attribute changes in the associated workgroup control block. At the end of each workgroup, the master thread copies the changed attributes into the associated workgroup control block 319. The workgroup control blocks are scanned 403 by the rendering threads, or rendering nodes in a distributed network system; unprocessed workgroups are locked 406, and each rendering thread's attribute state is updated 413 from the previous workgroup control blocks. Once the rendering thread has updated its attributes, it has the necessary state to independently process the workgroup, thus allowing parallel execution. A synchronizer thread uses the synchronization tags to reorder the graphics datastream created by the rendering threads and sequentially sends the resultant data to the graphics hardware 115.
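The pipeline summarized above (master thread, workgroup control blocks, rendering threads, synchronizer) can be illustrated with a short Python sketch. It is a hypothetical, simplified model written for this description, not the patented implementation: the command tuples, the fixed-size "END WORKGROUP" rule and all function names are assumptions. For brevity the master thread builds every workgroup before the rendering threads start, and the synchronizer reorders the tagged output after the threads finish rather than scanning queue headers concurrently.

```python
import heapq
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical command format (an assumption, not from the patent):
#   ("attr", name, value)  changes a graphics attribute
#   ("prim", name)         draws a primitive with the current attributes
Command = Tuple

@dataclass
class WorkgroupControlBlock:
    sync_tag: int               # order in which the workgroup was created
    commands: List[Command]     # bundled primitive and attribute commands
    changed: Dict[str, object]  # flagged attributes, copied at end of workgroup
    lock: threading.Lock = field(default_factory=threading.Lock)
    scanned_by: Set[int] = field(default_factory=set)  # ids of threads that scanned it

def master_thread(commands: List[Command], wg_size: int) -> List[WorkgroupControlBlock]:
    """Bundle commands into workgroups; END WORKGROUP is assumed to be a fixed size."""
    context: Dict[str, object] = {}  # master graphics context
    blocks: List[WorkgroupControlBlock] = []
    current: List[Command] = []
    flagged: Set[str] = set()
    for i, cmd in enumerate(commands):
        if cmd[0] == "attr":
            context[cmd[1]] = cmd[2]  # update the master context
            flagged.add(cmd[1])       # flag the change for this workgroup
        current.append(cmd)
        if len(current) == wg_size or i == len(commands) - 1:
            # End of workgroup: copy only the changed attributes into the block.
            changed = {name: context[name] for name in flagged}
            blocks.append(WorkgroupControlBlock(len(blocks), current, changed))
            current, flagged = [], set()
    return blocks

def catch_up(state: Dict[str, object], blocks, index: int, tid: int) -> None:
    """Reverse-scan earlier blocks so `state` matches the start of blocks[index]."""
    updated: Set[str] = set()
    for wcb in reversed(blocks[:index]):
        if tid in wcb.scanned_by:    # already reflected in this thread's state
            break
        for name, value in wcb.changed.items():
            if name not in updated:  # keep only the most recent change
                state[name] = value
                updated.add(name)
        wcb.scanned_by.add(tid)

def rendering_thread(tid, blocks, initial, out, out_lock) -> None:
    state = dict(initial)  # local graphics context starts from the initial state
    for i, wcb in enumerate(blocks):
        if not wcb.lock.acquire(blocking=False):  # claimed by another thread
            continue
        catch_up(state, blocks, i, tid)
        data = []
        for cmd in wcb.commands:
            if cmd[0] == "attr":
                state[cmd[1]] = cmd[2]
            else:
                attrs = ",".join(str(state[k]) for k in sorted(state))
                data.append(f"{cmd[1]}({attrs})")
        wcb.scanned_by.add(tid)  # this workgroup is now part of the local state
        with out_lock:
            out.append((wcb.sync_tag, data))

def synchronizer(tagged: List[Tuple[int, List[str]]]) -> List[str]:
    """Emit each workgroup's output strictly in synchronization-tag order."""
    heap: list = []
    next_tag, stream = 0, []
    for tag, data in tagged:
        heapq.heappush(heap, (tag, data))
        while heap and heap[0][0] == next_tag:
            stream.extend(heapq.heappop(heap)[1])
            next_tag += 1
    return stream

def run(commands, initial, wg_size=3, n_threads=2) -> List[str]:
    blocks = master_thread(commands, wg_size)
    out: list = []
    out_lock = threading.Lock()
    workers = [threading.Thread(target=rendering_thread,
                                args=(t, blocks, initial, out, out_lock))
               for t in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return synchronizer(out)
```

In this model, run() produces the same primitive output as executing the commands one by one on a single thread, whatever order the threads claim workgroups in, because each thread reverse-scans only the control blocks it has not yet scanned and keeps just the most recent value of each flagged attribute.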

19 Claims, 4 Drawing Sheets 
GRAPHICS INTERFACE PROCESSING METHODOLOGY IN SYMMETRIC MULTIPROCESSING OR DISTRIBUTED NETWORK ENVIRONMENTS

FIELD OF THE INVENTION

The present invention relates generally to information processing systems and more particularly to an improved graphics processing method and apparatus for multiprocessor or distributed network computer graphics systems supporting an OpenGL or similar graphics programming interface.

BACKGROUND OF THE INVENTION

System graphics technologies are developing at an increasingly faster pace in order to keep up with the great demand for graphics displays and visual enhancements for almost all computer applications in many fields of endeavor. To a great extent, current developments are driven by increasing demand for, and use of, computer-aided design (CAD) applications, computer-aided manufacturing (CAM) applications and computer-aided engineering (CAE) tools. The increasing sophistication of these applications and tools requires faster and faster processing times for the applications and tools to remain useful. Also, the development of additional programming capabilities and enhanced visual effects creates additional demand for more expansive data handling capabilities and faster system processing speeds.

In response to these demands, symmetric multiprocessor (SMP) data processing systems have been employed to improve overall system performance and support enhanced graphics capabilities. In general, overall system performance is improved by providing multiple processors to allow multiple applications or programs to execute simultaneously on the same data or information processing system. In networks, the computer that may display the graphics created by a user, i.e. the server computer, may not be the same computer upon which the drawing commands are created, i.e. the client computer. Such systems utilizing a standard graphics application interface, such as the "OpenGL" graphics interface for example, can be implemented on many different hardware platforms. However, attempts to accomplish parallel execution of a single OpenGL or similar graphics interface application on a plurality of processors have not been totally successful.

A number of difficulties must be overcome in order to build a system that outperforms a uniprocessor implementation. In a graphics parallel processing environment, each thread running on an individual processor needs to be working constantly in order to obtain maximum system performance. Each individual processor can be one of the processors in an SMP system or one of the nodes of a distributed network system. In addition, each thread typically receives only a portion of a graphics datastream, yet each thread needs access to the entire graphics datastream in order to maintain correct attribute state. Further, all commands must be handled in sequential order to establish the correct attribute state.

Wait conditions are problematical and cause system delays where individual threads must wait for all previous commands to be processed. Another common problem is the latency incurred in starting and stopping a parallel pipeline. Operations that cause a pipeline to stop or be interrupted must be avoided.

A graphics hardware interface system needs to be able to work efficiently for a variable number of processors in a multiprocessing environment. Thus there is a need to provide a methodology and apparatus which efficiently exploits a multiprocessor environment to optimize performance of an "OpenGL" or similar graphics interface system.

SUMMARY OF THE INVENTION

A method and implementing multiprocessor computer system in which graphics applications are executed in conjunction with a graphics interface to graphics hardware. This method is also applicable to an implementing distributed network system. The master thread, or master node in the case of a distributed network system, receives primitive and attribute commands from a graphics application and assembles the commands into workgroups with associated workgroup control blocks and synchronization tags. The master thread context is updated in accordance with graphics attribute changes. For each workgroup, the master thread flags such attribute changes in the associated workgroup control block. Unchanged attributes are maintained from an initial attribute state. At the end of a workgroup, the master thread copies the changed attributes into the workgroup control block. The workgroup control blocks are scanned by the rendering threads. When an unprocessed workgroup is detected, it is locked, and the attribute state of the rendering thread, or the rendering node in the case of a distributed network system, is updated from the previous workgroup control blocks. Once the rendering thread has updated its attributes, it has the necessary state to independently process the workgroup, thus allowing parallel execution. The synchronizer thread reorders the graphics datastream created by the rendering threads, using the synchronization tags, and sequentially sends the resultant datastream to the graphics hardware.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the present invention are set forth in the claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in connection with the accompanying drawings in which:

FIG. 1 is a schematic representation of a graphics architecture in accordance with the present invention;

FIG. 2 is a simplified schematic drawing of a multiprocessor computer system in which the present invention may be implemented;

FIG. 3 is a flowchart illustrating a high level flow sequence of the workgroup creation methodology disclosed herein; and

FIG. 4 is a flowchart illustrating workgroup selection and attribute updating methodology in accordance with the present invention.

DETAILED DESCRIPTION

In FIG. 1, there is shown a graphics application 101 which is typically running on a workstation or other computer system. As hereinafter noted, an exemplary system may include a plurality of workstations or computers connected in a network configuration and having a common bus which may include a plurality of central processing units or CPUs in a multiprocessing environment, and various display capabilities.

In sending graphics data and commands for display, a graphics interface 103, for example "OpenGL", receives
primitive and attribute commands from application 101. A primitive defines the shape of various components of an object, such as lines, points, polygons, and text in two or three dimensions. An attribute defines a state such as the linestyle, color, surface texture, material or matrices.

In the present example, the graphics application 101 is coupled to the graphics interface 103, which interfaces the application 101 or applications to an implementing hardware system 115 through a plurality of threads. A thread is a predefined program segment within a larger process or program segment and is operable to effect the accomplishment of a specified individual graphics task such as rasterizing or rendering. In the disclosed method, for a parallel processing environment, one of a plurality of threads will be a master thread 105, which is the thread through which the application 101 communicates with the interface system 103. The master thread 105, within the graphics interface, creates a plurality of threads 107, 109 to be used for rendering. One thread is designated as the synchronizer thread 111, which sorts the datastreams from all of the threads into sequential order and communicates the resultant datastream to the hardware 115. A plurality of rendering threads 113, such as thread 107 and thread 109, are connected in parallel between master thread 105 and synchronizer thread 111.

Each thread maintains its own local graphics context containing the attribute state. Master thread 105 includes a local graphics context 106 associated therewith. Similarly, threads 107, 109 and 111 include related graphics contexts 108, 110 and 112, respectively, associated therewith.

The thread designated as the master thread 105 operates as a datastream distributor, receives graphics interface commands from a graphics application 101, and sequentially bundles the primitive and attribute commands into workgroups for future processing by a rendering thread. The number of commands in each workgroup is based on the number of vertices contained in the rendering commands, the number and size of attribute commands received, and the estimated amount of processing time for a workgroup. The sizes of the workgroups are crucial in balancing the workload of the processors within a parallel system.

In the present example, the most frequently occurring function calls, such as "glColor", "glNormal", "glIndex", "glEdgeFlag" and "glTexCoord", are not executed immediately upon receipt; rather, a pointer to the function call information is stored in the workgroup, and at the end of the packaging of the workgroup, the pointers are tested. If any of the pointers are set, they are processed in their entirety at that time. This method avoids processing the same function call many times during the workgroup when only the last instance of each of the frequently occurring graphics interface function calls is needed.

Each graphics interface command from a user application 101 is bundled sequentially for future work by the rendering threads. Each workgroup is distinguished by a synchronization tag which is used by the synchronizer thread 111 for sequential ordering of the datastream. For each attribute command that is received, the master thread 105 updates the state of the master graphics context 106, flags the particular change, and places the command in a workgroup. At the end of a workgroup, the master thread 105 copies the attribute state that has changed within that workgroup from the master thread's graphics context 106 to a workgroup control block.

Workgroup control blocks contain the information needed by the rendering threads 107, 109 to select the workgroup for processing and to update the thread's attributes to the state at the beginning of the workgroup. The key pieces of the workgroup control block are the pointers to the bundled primitive and attribute commands, the attribute change flags, the changed attribute state, the synchronization tag, and a lock. The lock is used to ensure that only one rendering thread may process the workgroup. The master thread sets all information except for the lock.

The rendering threads 107, 109 scan the list of workgroup control blocks and lock the first unprocessed workgroup, so that no other thread will process the same workgroup. Before processing can begin on the locked workgroup, the thread's attribute state must correspond to the beginning of the locked workgroup, i.e. the attribute state as if this thread had processed all previous commands. To acquire the required attribute state, the rendering thread scans the list of workgroup control blocks in reverse order from the workgroup it has just locked, updating its local attribute state from the attributes that have been marked by the flags in each of the workgroup control blocks. In the process of scanning back, once an attribute is updated locally, the thread will not update that attribute again. The thread continues this process until all attributes have been updated and the thread reaches the last workgroup processed by this thread.

With the technique described above, only the most recent attribute changes are updated in the rendering thread's local attribute state. The rendering threads do not incur delays associated with updating attributes every time attributes are changed; rather, the updating process occurs only when individual threads require access to the updated attributes, and then only with regard to the required attributes. This method efficiently updates the attributes needed by the rendering threads without having to process all previous workgroups.

After the attributes have been updated, the thread marks the workgroup control block as scanned by the thread. In order for the workgroup control block to be reused by the master thread, all of the rendering threads must mark the workgroup control block as processed. The flagging of attributes by the master thread and the updating of the local state by the rendering threads is a key element; it enables the packeting of work for rendering threads and also the ability of the rendering threads to work in parallel.

The rendering threads create a datastream contained in queues which are directly sent to the graphics hardware 115. The datastream is created asynchronously between the threads, since one rendering thread may be working faster or slower than another. Each rendering thread has a set of queues with associated headers containing information about the queue and a synchronization tag. To accomplish the desired ordering, the synchronizer thread 111 scans the queue headers of all the rendering threads for the next synchronization tag. The resultant datastream is temporally ordered by the synchronizer thread 111 and sent to the graphics hardware 115 for proper rendering.

In FIG. 2, an exemplary system 200 is illustrated for implementing the processing methods disclosed herein. The graphics subsystem 217 corresponds to the hardware 115 block illustrated in FIG. 1. FIG. 2 depicts a simplified block diagram of selected components in an information processing or data processing system. The processing system includes a central processing unit (CPU) or processor 201 connected to a central bus 203. A second processor 202 is also shown connected to the bus 203. The system may also include additional processors connected to the central bus 203. The illustrated system is an example of a symmetric
multiprocessor (SMP) architecture having a plurality of processors servicing the system. Additionally, a plurality of such systems could be connected together to form a distributed network system. Further, the central bus arrangement illustrated in the present example may also be implemented in other arrangements including but not limited to a peripheral component interconnect (PCI) local bus.

The exemplary processing system includes a memory subsystem 205 and a cache memory 207 connected to the bus 203. The memory subsystem typically includes a memory controller and system RAM memory. Also connected to the bus 203 is a storage block 215 which may include one or more of several storage function devices including but not limited to floppy disk drives, hard drives, tape drives, flash memory, etc. An input interface device 209 applies inputs from one or more input devices, such as a keyboard 211 and a mouse 213, to the bus 203. The system also includes a display device 219 which is connected through a graphics subsystem 217 to the bus 203. The graphics subsystem 217 typically includes an internal graphics processor as well as a frame buffer memory for use in connection with the display device. For example, the graphics subsystem 217 generally includes rasterization hardware as well as other specific graphics engines. The bus 203 may be extended 221 to be connected to other system and/or station devices in a network or other configuration. Instructions for performing the processes and methods of the present invention may be executed by the processors 201 and 202 and/or a separate graphics processor within the graphics subsystem 217. Such instructions may be embodied within or stored in any one of, or a combination of, storage devices and/or memory devices including RAM memory within the memory subsystem 205, any of the possible storage elements of the storage block 215 or any of a number of portable storage devices such as floppy disks or CDs.

The flowchart of FIG. 3 illustrates the graphics processing methods as implemented by the master thread 105, including the creation of workgroups. Initially 301 a master thread is designated 303 as hereinbefore discussed. A determination is made 304 as to whether any graphics commands have been generated. When a graphics application command is detected, the command is received 305 by the master thread 105, and a determination is made 307 as to whether an attribute change is required for the particular command received. If an attribute change is required, the master thread context 106 is updated 309 and the attribute change is flagged in a workgroup control block 311 by the master thread. After the attribute change has been made, or if no attribute change is required 307, the master thread assembles the command into a workgroup 313 as hereinbefore described. A determination is then made as to whether an "END WORKGROUP" condition is true 314. If the workgroup (WG) is not ended, the process returns to detect subsequent graphics commands 304. If the WG is to be ended 314, the master thread then creates a workgroup control block 315 and a synchronization tag 317 in accordance with the order in which the workgroup was created. The master thread copies the changed attributes 319, if any, into the workgroup control block and awaits 304 the receipt of another graphics command from the application 101. When there are no more graphics commands, such as when the application program has terminated, the illustrated process ends 323.

In FIG. 4, the methodology as implemented by the rendering threads is illustrated, including functional descriptions of rendering threads, workgroup selection and attribute updates. When a rendering thread is initiated 401, the workgroup control blocks are scanned 403 and a determination is made as to whether there are unprocessed workgroups 405. When an unprocessed workgroup is identified, that workgroup is locked 406 and the attributes are updated using the workgroup control blocks in reverse order 407 to obtain the most recent attribute changes. If the previous workgroup control block had been scanned by the current thread 407, then the workgroup (WG) is rendered 408 and the process returns to scan WG control blocks for unprocessed workgroups. If a previous WG control block was not scanned by the current thread 407, a determination is made as to whether there is an attribute change 411. When an attribute change is detected, the local attribute state is updated 413 and a determination is made 415 as to whether all attributes in the workgroup have been updated. If there are other attribute changes in the workgroup that have not been updated, then the process repeats to update the changes 413. Once all changes have been updated, or if there are no additional attribute changes detected 411, the workgroup is marked as scanned 417. The thread repeats the process until all previous workgroup control blocks are marked as scanned. The rendering thread is then ready to process the locked workgroup. The flagging 311 of attributes by the master thread 105 and the updating 413 of the local state by the rendering threads, e.g. threads 107 and 109, enable the packeting of work for the rendering threads and also enable the rendering threads to work in parallel.

The method and apparatus of the present invention have been described in connection with a preferred embodiment as disclosed herein. Although an embodiment of the present invention has been shown and described in detail herein, along with certain variants thereof, many other varied embodiments that incorporate the teachings of the invention may be easily constructed by those skilled in the art, programmed into system memories and/or transportable and [...] of systems, and/or also included or integrated into a CPU or other larger system integrated circuit or functional chip such as a graphics chip or graphics board or subsystem. Accordingly, the present invention is not intended to be limited to the specific form set forth herein, but on the contrary, it is intended to cover such alternatives, modifications, and equivalents, as can be reasonably included within the spirit and scope of the invention.

What is claimed is:

1. A method of processing commands received from a software application by a graphics interface, the graphics interface being selectively operable to provide output datastreams for application to a graphics hardware subsystem, said method comprising:

receiving commands from the software application by a master thread using the graphics interface;

updating a master thread context for attribute changes in said commands;

assembling the commands into workgroups having associated workgroup control blocks;

copying said attribute changes to said workgroup control blocks;

scanning said workgroup control blocks by rendering threads whereby said rendering threads are updated with attribute changes; and

sending said output datastreams created by said rendering threads to the graphics hardware subsystem.

2. The method as set forth in claim 1 wherein said assembling further includes marking synchronization tags in said workgroup control blocks, said synchronization tags being indicative of the sequence in which said commands were received.

3. The method as set forth in claim 2 and, after said
scanning, said method further including: 

sequencing said output datastreams in accordance with 
said synchronization tags. 

4. The method as set forth in claim 3 wherein after said 
updating, said method further includes: 

creating a workgroup control block; and 
flagging said attribute changes in said workgroup control 
block. 

5. The method as set forth in claim 3 wherein after said 
scanning, said method further includes: 

locking said workgroup control blocks until after said 
rendering threads have been updated. 

6. The method as set forth in claim 5 wherein said 
rendering thread attributes are updated in reverse order from 
previous workgroup control blocks. 

7. The method as set forth in claim 2 wherein after said 
updating, said method further includes: 

creating a workgroup control block; and 
flagging said attribute changes in said workgroup control 
block. 

8. The method as set forth in claim 2 wherein after said 
scanning, said method further includes: 

locking said workgroup control blocks until after said 
rendering threads have been updated. 

9. The method as set forth in claim 8 wherein said 
rendering thread attributes are updated in reverse order from 
previous workgroup control blocks. 

10. The method as set forth in claim 1 wherein after said 
updating, said method further includes: 

creating a workgroup control block; and 
flagging said attribute changes in said workgroup control 
block. 

11. The method as set forth in claim 10 wherein after said
scanning, said method further includes: 

locking said workgroup control blocks until after said 
rendering threads have been updated. 

12. The method as set forth in claim 11 wherein said 
rendering thread attributes are updated in reverse order from 
previous workgroup control blocks. 

13. The method as set forth in claim 1 wherein after said 
scanning, said method further includes: 

locking said workgroup control blocks until after said
rendering threads have completed processing of said 
workgroups. 

14. The method as set forth in claim 13 wherein said 
rendering thread attributes are updated in reverse order from 
previous workgroup control blocks.

15. A storage medium including machine readable indicia, 
said storage medium being selectively coupled to a reading 
device, said reading device being selectively coupled to 
processing circuitry, said reading device being selectively 
operable to read said machine readable indicia and provide
program signals representative thereof, said program signals
being effective to cause said processing circuitry to interface
a software application with a graphics hardware subsystem
associated with said processing circuitry, said program sig- 
nals being selectively operable to cause said processing 
circuitry to provide output data streams for application to 
said graphics hardware subsystem by performing the steps 
of: 

receiving commands from the software application by a
master thread using the graphics interface; 

updating a master thread context for attribute changes in 
said commands; 

assembling the commands into workgroups having asso- 
ciated workgroup control blocks; 

copying said attribute changes to said workgroup control 
blocks; 

scanning said workgroup control blocks by rendering 
threads whereby said rendering threads are updated 
with attribute changes; and 

sending said output datastreams created by said rendering 
threads to the graphics hardware subsystem. 

16. The medium as set forth in claim 15 wherein said 
medium comprises a magnetic diskette.

17. The medium as set forth in claim 15 wherein said 
medium comprises a CD-ROM. 

18. An information processing system comprising: 
a plurality of processing circuits; 

a memory device for use in conjunction with said pro- 
cessing circuits, said memory device being selectively 
operable for storing a software application;

a bus system connecting said processing circuits and said 
memory device; 

a graphics hardware subsystem connected to said bus 
system; and 

an interface element, said interface element being selec- 
tively operable for receiving commands from said 
software application to provide output data streams for 
application to said graphics hardware subsystem, said 
interface element being further selectively operable for: 
updating a master thread context for attribute changes
in said commands;

assembling the commands into workgroups having
associated workgroup control blocks;

copying said attribute changes to said workgroup control
blocks;

scanning said workgroup control blocks by rendering 
threads whereby said rendering threads are updated 
with attribute changes; and 

sending said output datastreams created by said ren- 
dering threads to the graphics hardware subsystem, 
said rendering threads being executed in parallel by 
said processing circuits. 

19. The information processing circuit as set forth in 
claim 18 wherein said interface element is a software 
interface. 